In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the vocal tract [Titze, I.R. (1994). Principles of Voice Production, Prentice Hall; Titze, I.R., Baken, R.J., Bozeman, K.W., Granqvist, S., Henrich, N., Herbst, C.T., Howard, D.M., Hunter, E.J., Kaelin, D., Kent, R.D., Löfqvist, A., McCoy, S., Miller, D.G., Noé, H., Scherer, R.C., Smith, J.R., Story, B.H., Švec, J.G., Ternström, S. and Wolfe, J. (2015). "Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization." J. Acoust. Soc. America 137, 3005–3007]. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum [Jeans, J.H. (1938). Science & Music, reprinted by Dover, 1968; Standards Secretariat, Acoustical Society of America (1994). ANSI S1.1-1994 (R2004) American National Standard Acoustical Terminology, (12.41), Acoustical Society of America, Melville, NY]. For harmonic sounds, under this definition, the formant frequency is sometimes taken as that of the harmonic that is most augmented by a resonance. The difference between these two definitions lies in whether "formants" characterise the production mechanisms of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs slightly from the associated resonance frequency, except when, by chance, a harmonic is aligned with the resonance frequency, or when the sound source is mostly non-harmonic, as in whispering and vocal fry.
A room can be said to have formants characteristic of that particular room, due to its resonances, i.e., to the way sound reflects from its walls and objects. Room formants of this nature reinforce themselves by emphasizing specific frequencies and absorbing others, as exploited, for example, by Alvin Lucier in his piece I Am Sitting in a Room. In acoustic digital signal processing, the way a collection of formants (such as a room) affects a signal can be represented by an impulse response.
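The impulse-response view can be illustrated with a minimal sketch, assuming each resonance behaves like an exponentially decaying sinusoid. The frequencies, decay times, and sample rate below are hypothetical values chosen for illustration, not measurements of any real room. Convolving a dry signal with the impulse response gives the signal as coloured by the resonances.

```python
import math

def resonance_impulse_response(freq_hz, decay_s, fs_hz, length):
    """Idealised impulse response of one resonance: a decaying sinusoid."""
    return [
        math.exp(-n / (decay_s * fs_hz)) * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
        for n in range(length)
    ]

def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # silent samples contribute nothing
        for j, h_j in enumerate(impulse_response):
            out[i + j] += s * h_j
    return out

FS = 8000  # sample rate in Hz (assumed)

# Hypothetical "room": the sum of two resonances at 120 Hz and 310 Hz.
h = [
    a + b
    for a, b in zip(
        resonance_impulse_response(120.0, 0.02, FS, 400),
        resonance_impulse_response(310.0, 0.01, FS, 400),
    )
]

# Dry source: a full-scale click followed by a half-scale click 100 samples later.
dry = [0.0] * 200
dry[0] = 1.0
dry[100] = 0.5

# Each click in the dry signal launches a scaled copy of the impulse response.
wet = convolve(dry, h)
```

Because the first click has unit amplitude, the first 100 samples of `wet` are simply the first 100 samples of `h`; after that, the second click adds a half-amplitude copy shifted by 100 samples.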
In both speech and rooms, formants are characteristic features of the resonances of the space. They are said to be excited by acoustic sources such as the voice, and they shape (filter) the sources' sounds, but they are not sources themselves.
[Table: Average vowel formants for a male voice (in Hz)]
Formants are distinctive frequency components of the acoustic signal produced by speech, musical instruments [Reuter, Christoph (2009). "The role of formant positions and micro-modulations in blending and partial masking of musical instruments." Journal of the Acoustical Society of America (JASA), Vol. 126, No. 4, p. 2237] or singing. The information that humans require to distinguish between speech sounds can be represented purely quantitatively by specifying peaks in the frequency spectrum. Most of these formants are produced by tube and chamber resonance, but a few whistle tones derive from periodic collapse of Venturi effect low-pressure zones.
The formant with the lowest frequency is called F1, the second F2, the third F3, and so forth. The fundamental frequency or pitch of the voice is sometimes referred to as F0, but it is not a formant. Most often the first two formants, F1 and F2, are sufficient to identify the vowel. The relationship between the perceived vowel quality and the first two formant frequencies can be appreciated by listening to "artificial vowels" that are generated by passing a click train (to simulate the glottal pulse train) through a pair of bandpass filters (to simulate vocal tract resonances). Front vowels have higher F2, while open (low) vowels have higher F1. Lip rounding tends to lower F1 and F2 in back vowels and F2 and F3 in front vowels.
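The "artificial vowel" idea can be sketched in a few lines, under some stated assumptions: the two bandpass filters are run in parallel and their outputs summed, each filter is a simple two-pole, two-zero design, and the sample rate, bandwidth, and formant values are illustrative choices rather than values from the text. The function names are hypothetical.

```python
import math

def click_train(f0_hz, duration_s, fs_hz):
    """One unit click per glottal period, zeros elsewhere."""
    period = round(fs_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(int(duration_s * fs_hz))]

def bandpass(x, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole, two-zero bandpass: poles near centre_hz, zeros at DC and Nyquist."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius sets the bandwidth
    c1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / fs_hz)
    c2 = -r * r
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = xn - x2 + c1 * y1 + c2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def artificial_vowel(f1_hz, f2_hz, f0_hz=100.0, duration_s=0.5,
                     fs_hz=16000, bandwidth_hz=80.0):
    """Click train through two parallel bandpass filters, normalised to peak 1."""
    source = click_train(f0_hz, duration_s, fs_hz)
    band1 = bandpass(source, f1_hz, bandwidth_hz, fs_hz)
    band2 = bandpass(source, f2_hz, bandwidth_hz, fs_hz)
    mix = [p + q for p, q in zip(band1, band2)]
    peak = max(abs(s) for s in mix)
    return [s / peak for s in mix]

# Illustrative formant pairs (Hz): an open vowel and a close front vowel.
open_vowel = artificial_vowel(700.0, 1100.0)
close_front_vowel = artificial_vowel(300.0, 2300.0)
```

The returned sample lists can be written to an audio file or played back with any audio library; changing only the two centre frequencies is enough to change the perceived vowel quality.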
Nasal consonants usually have an additional formant around 2500 Hz. The liquid [l] usually has an extra formant at 1500 Hz, whereas the English "r" sound ([ɹ]) is distinguished by a very low third formant (well below 2000 Hz).
Plosives (and, to some degree, fricatives) modify the placement of formants in the surrounding vowels. Bilabial sounds (such as /b/ and /p/ in "ball" or "sap") cause a lowering of the formants. On spectrograms, velar sounds (/k/ and /ɡ/ in English) almost always show F2 and F3 coming together in a 'velar pinch' before the velar consonant and separating from the same 'pinch' as the velar is released. Alveolar sounds (English /t/ and /d/) cause fewer systematic changes in neighbouring vowel formants, depending partially on exactly which vowel is present. The time course of these changes in vowel formant frequencies is referred to as 'formant transitions'.
In normal voiced speech, the underlying vibration produced by the vocal folds resembles a sawtooth wave, rich in harmonic overtones. If the fundamental frequency or (more often) one of the overtones is higher than a resonance frequency of the system, then the resonance will be only weakly excited and the formant usually imparted by that resonance will be mostly lost. This is most apparent in the case of soprano opera singers, who sing at pitches high enough that their vowels become very hard to distinguish.
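A small arithmetic sketch shows why high pitches sample the resonances so sparsely. Harmonics sit at integer multiples of the fundamental, so the closest available harmonic can lie far from a given resonance. The fundamental and resonance values used here are hypothetical round numbers, and the function name is illustrative.

```python
def nearest_harmonic(f0_hz, resonance_hz):
    """Return (harmonic number, harmonic frequency, offset from the resonance)."""
    n = max(1, round(resonance_hz / f0_hz))  # harmonics start at n = 1
    harmonic_hz = n * f0_hz
    return n, harmonic_hz, harmonic_hz - resonance_hz

# Low-pitched speech: harmonics every 100 Hz, so one lands on a 300 Hz resonance.
speech = nearest_harmonic(100, 300)

# High soprano note: the lowest harmonic (1000 Hz) is already 700 Hz above it.
soprano = nearest_harmonic(1000, 300)

print(speech, soprano)
```

With a 100 Hz fundamental no resonance is ever more than 50 Hz from a harmonic, whereas with a 1000 Hz fundamental a resonance below the fundamental has no harmonic near it at all.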
Control of resonances is an essential component of the vocal technique known as overtone singing, in which the performer sings a low fundamental tone, and creates sharp resonances to select upper harmonics, giving the impression of several tones being sung at once.
Spectrograms may be used to visualise formants. In spectrograms, it can be hard to distinguish formants from naturally occurring harmonics when one sings. However, one can hear the natural formants in a vowel shape through atonal techniques such as vocal fry.
Different methods exist to obtain this information. Formant frequencies, in their acoustic definition, can be estimated from the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer. However, to estimate the acoustic resonances of the vocal tract (i.e. the speech definition of formants) from a speech recording, one can use linear predictive coding. An intermediate approach consists in extracting the spectral envelope by neutralizing the fundamental frequency, and only then looking for local maxima in the spectral envelope.
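The linear-prediction route can be sketched as follows. This is a minimal illustration, not a production formant tracker: it uses the autocorrelation method with a Hamming window and Levinson-Durbin recursion, then picks local maxima of the resulting all-pole envelope on a 10 Hz grid. The test signal is synthetic (a click train through two cascaded resonators at 500 Hz and 1500 Hz), and the prediction order, frame length, and all function names are assumptions made for the example.

```python
import cmath
import math

def resonator(x, freq_hz, bandwidth_hz, fs_hz):
    """Two-pole resonator: y[n] = x[n] + c1*y[n-1] + c2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    c1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    c2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + c1 * y1 + c2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

def autocorrelation(frame, max_lag):
    """Autocorrelation of the frame at lags 0..max_lag."""
    return [
        sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
        for k in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Coefficients of A(z) = 1 + a1*z^-1 + ... + ap*z^-p from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lpc_envelope_peaks(frame, order, fs_hz, step_hz=10.0):
    """Frequencies (Hz) of local maxima of the LPC spectral envelope 1/|A|."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))  # Hamming window
        for i, s in enumerate(frame)
    ]
    a = levinson_durbin(autocorrelation(windowed, order), order)
    freqs = [k * step_hz for k in range(1, int(fs_hz / 2 / step_hz))]
    env = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs_hz
        env.append(1.0 / abs(sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))))
    return [freqs[i] for i in range(1, len(env) - 1) if env[i - 1] < env[i] >= env[i + 1]]

# Synthetic voiced sound: 100 Hz click train through resonances at 500 and 1500 Hz.
FS = 8000
source = [1.0 if n % 80 == 0 else 0.0 for n in range(4000)]
voiced = resonator(resonator(source, 500.0, 80.0, FS), 1500.0, 80.0, FS)

# Order 4 allows two pole pairs, one per resonance in this toy signal.
peaks = lpc_envelope_peaks(voiced[2000:3024], order=4, fs_hz=FS)
print(peaks)
```

For real speech, a higher prediction order is normally needed, and the estimates depend on the frame position, window, and pre-processing, so the values returned by a sketch like this should be treated as approximate.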
Vowels will almost always have four or more distinguishable formants, and sometimes more than six. However, the first two formants are the most important in determining vowel quality and are often plotted against each other in vowel diagrams [Deterding, David (1997). "The Formants of Monophthong Vowels in Standard Southern British English Pronunciation", Journal of the International Phonetic Association, 27, pp. 47–55], though this simplification fails to capture some aspects of vowel quality such as rounding [Hayward, Katrina (2000). Experimental Phonetics, Harlow, UK: Pearson, p. 149].
Many writers have addressed the problem of finding an optimal alignment of the positions of vowels on formant plots with those on the conventional vowel quadrilateral. The pioneering work of Ladefoged used the mel scale because this scale was claimed to correspond more closely to the auditory scale of pitch than does the acoustic measure of fundamental frequency expressed in hertz. Two alternatives to the mel scale are the Bark scale and the ERB-rate scale.
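The three auditory scales can be compared with a short sketch. Several approximation formulas exist for each scale; the ones used here are commonly cited choices (the 2595 log10 form for mel, Traunmüller's 1990 formula for Bark, and the Glasberg and Moore 1990 formula for ERB-rate) and are an assumption of this example, not necessarily the formulas used by the authors mentioned above. The vowel formant values are hypothetical round numbers.

```python
import math

def hz_to_mel(f_hz):
    """Mel value, using the common 2595*log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    """Bark value, using Traunmueller's (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def hz_to_erb_rate(f_hz):
    """ERB-rate value, using the Glasberg and Moore (1990) approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

# Hypothetical (F1, F2) pairs in Hz, for illustration only.
vowels = {"i": (300.0, 2300.0), "a": (700.0, 1100.0), "u": (300.0, 800.0)}

for label, (f1, f2) in vowels.items():
    print(
        label,
        "mel:", round(hz_to_mel(f1)), round(hz_to_mel(f2)),
        "Bark:", round(hz_to_bark(f1), 2), round(hz_to_bark(f2), 2),
        "ERB-rate:", round(hz_to_erb_rate(f1), 2), round(hz_to_erb_rate(f2), 2),
    )
```

All three conversions compress high frequencies relative to low ones, which is why a formant plot on these scales spaces the vowels more like the vowel quadrilateral than a plot in hertz does.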